Beyond position weight matrices: nucleotide correlations in transcription factor binding sites and their description
The identification of transcription factor binding sites (TFBSs) on genomic
DNA is of crucial importance for understanding and predicting regulatory
elements in gene networks. TFBS motifs are commonly described by Position
Weight Matrices (PWMs), in which each DNA base pair independently contributes
to the transcription factor (TF) binding, despite mounting evidence of
interdependence between base-pair positions. The recent availability of
genome-wide data on TF-bound DNA regions makes it possible to revisit this
question in detail for TF binding in vivo. Here, we use available fly and
mouse ChIP-seq data and show that the independent model generally does not
reproduce the observed statistics of TFBSs, generalizing previous observations.
We further show that TFBS description and predictability can be systematically
improved by taking into account pairwise correlations in the TFBS via the
principle of maximum entropy. The resulting pairwise interaction model is
formally equivalent to the disordered Potts models of statistical mechanics and
it generalizes previous approaches to interdependent positions. Its structure
allows for co-variation of two or more base pairs, as well as secondary motifs.
Although models consisting of mixtures of PWMs also have this last feature, we
show that pairwise interaction models outperform them. The significant pairwise
interactions are found to be sparse and to occur predominantly between consecutive
base pairs. Finally, using a pairwise interaction model to identify TFBSs is
shown to give predictions that differ significantly from those of a model based
on independent positions.
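The following is a minimal Python sketch of the contrast drawn above. It estimates the single-position frequencies f_i(a) and the pair frequencies f_ij(a,b) from aligned sites. It then computes the connected correlations C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b), which vanish when positions are independent, as a PWM assumes. Finally, it scores a site with fields h_i and couplings J_ij, giving the log-weight of a pairwise (Potts-like) model P(s) ∝ exp(Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j)).

All function names are hypothetical, and so are the toy sites and the couplings. The couplings are set by hand rather than inferred by maximum entropy. In practice, inference requires an iterative fit, for example gradient ascent on the likelihood or the pseudo-likelihood.

```python
import math
from collections import Counter
from itertools import combinations


def frequencies(sites):
    """Single-position f_i(a) and pairwise f_ij(a, b) frequencies of aligned sites."""
    n, length = len(sites), len(sites[0])
    f1 = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        f1.append({a: c / n for a, c in counts.items()})
    f2 = {}
    for i, j in combinations(range(length), 2):
        counts = Counter((s[i], s[j]) for s in sites)
        f2[(i, j)] = {ab: c / n for ab, c in counts.items()}
    return f1, f2


def connected_correlations(f1, f2):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b); all zero if positions are independent."""
    corr = {}
    for (i, j), pair_freq in f2.items():
        for a, fa in f1[i].items():
            for b, fb in f1[j].items():
                corr[(i, j, a, b)] = pair_freq.get((a, b), 0.0) - fa * fb
    return corr


def pairwise_score(site, h, couplings):
    """Log-weight sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j).

    With an empty `couplings` dict this is an independent (PWM-style) score.
    Letters or pairs missing from the tables contribute 0 here; a real model
    would use pseudocounts instead.
    """
    score = sum(h[i].get(a, 0.0) for i, a in enumerate(site))
    for (i, j), table in couplings.items():
        score += table.get((site[i], site[j]), 0.0)
    return score


# Toy aligned sites in which positions 0 and 1 co-vary perfectly.
sites = ["AC", "AC", "GT", "GT"]
f1, f2 = frequencies(sites)
corr = connected_correlations(f1, f2)
h = [{a: math.log(p) for a, p in col.items()} for col in f1]
# Hypothetical, hand-set couplings (not fitted by maximum entropy).
J = {(0, 1): {("A", "C"): 1.0, ("G", "T"): 1.0, ("A", "T"): -1.0, ("G", "C"): -1.0}}
print(corr[(0, 1, "A", "C")], corr[(0, 1, "A", "T")])
print(pairwise_score("AC", h, {}), pairwise_score("AT", h, {}))
print(pairwise_score("AC", h, J), pairwise_score("AT", h, J))
```

With `J` empty, the score reduces to a PWM log-score. That score gives `AC` and `AT` the same value, even though `AT` never occurs in the toy data. Adding a coupling term on the position pair (0, 1) separates them.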
Uncovering the fragility of large-scale engineering project networks
Engineering projects are notoriously hard to complete on time, with project
delays often theorised to propagate across interdependent activities. Here, we
use a novel dataset consisting of activity networks from 14 diverse,
large-scale engineering projects to uncover network properties that impact
timely project completion. We provide the first empirical evidence of the
infectious nature of activity deviations, where perturbations in the delivery
of a single activity can impact up to 4 activities downstream, leading to large
perturbation cascades. We further show that perturbation clustering
significantly affects overall project delays. Finally, we find that poorly
performing projects have their highest perturbations in high-reach nodes, which
can lead to the largest cascades, while well-performing projects have perturbations
in low-reach nodes, resulting in localised cascades. Altogether, these findings
pave the way for a network-science framework that can materially enhance the
delivery of large-scale engineering projects.
Comment: 13 pages, 3 figures, 7 supplementary figures
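Perturbation cascades on an activity network can be explored with ordinary graph traversal. The sketch below is a minimal example under several assumptions:

- Edges point from an activity to the activities that depend on it.
- "Reach" is taken to mean the number of activities downstream of a node.
- A breadth-first search with an optional depth cutoff bounds how far a cascade may travel.

These definitions, the function names and the toy network are hypothetical. The paper's own measures may be defined differently.

```python
from collections import deque


def downstream(graph, source, max_depth=None):
    """Activities reachable from `source` along precedence edges.

    Uses breadth-first search; if `max_depth` is given, only activities at
    most that many edges downstream are returned.
    """
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] == max_depth:
            continue
        for succ in graph.get(node, ()):
            if succ not in depth:
                depth[succ] = depth[node] + 1
                queue.append(succ)
    del depth[source]
    return set(depth)


def reach(graph, node):
    """Hypothetical 'reach': number of activities downstream of `node`."""
    return len(downstream(graph, node))


# Hypothetical activity network: edges point from an activity to its successors.
project = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
ranked = sorted(project, key=lambda n: reach(project, n), reverse=True)
print(ranked)                                 # activities ordered by reach
print(downstream(project, "A", max_depth=1))  # cascade limited to one step
```

Under these assumptions, ranking activities by reach is one way to locate the high-reach nodes where perturbations would matter most.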
The Ca2+ transient as a feedback sensor controlling cardiomyocyte ionic conductances in mouse populations.
Conductances of ion channels and transporters controlling cardiac excitation may vary in a population of subjects with different cardiac gene expression patterns. However, the amount of this variability and its origin are not quantitatively known. We propose a new conceptual approach to predict this variability that consists of finding combinations of conductances that generate a normal intracellular Ca2+ transient without any constraint on the action potential. Furthermore, we experimentally validate its predictions using the Hybrid Mouse Diversity Panel, a model system of genetically diverse mouse strains that allows us to quantify inter-subject versus intra-subject variability. The method predicts that the conductances of inward Ca2+ and outward K+ currents compensate for each other to generate a normal Ca2+ transient, in good quantitative agreement with current measurements in ventricular myocytes from the hearts of different isogenic strains. Our results suggest that a feedback mechanism sensing the aggregate Ca2+ transient of the heart suffices to regulate ionic conductances.
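The idea of searching for conductance combinations that reproduce a normal Ca2+ transient can be illustrated by rejection sampling. The sketch draws random conductance scale factors and keeps only the combinations whose output lands near a target.

The "model" here is a deliberately crude, hypothetical surrogate (`toy_ca_amplitude`), not a cardiomyocyte model. In this surrogate, inward Ca2+ conductance raises the transient and outward K+ conductance lowers it. A real study would simulate a full ionic model in its place. The sampling range, tolerance and all names are assumptions.

```python
import math
import random


def toy_ca_amplitude(g_cal, g_k):
    """HYPOTHETICAL surrogate, not a cardiomyocyte model.

    Inward Ca2+ conductance raises the transient; outward K+ conductance
    (a shorter action potential) lowers it. A real study would simulate
    a full ionic model instead.
    """
    return g_cal / g_k


def sample_population(n_samples, target=1.0, tolerance=0.1, seed=0):
    """Keep random conductance combinations whose toy Ca2+ amplitude is near target."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        # Scale factors relative to baseline, log-uniform between 0.5x and 2x.
        g_cal = 2.0 ** rng.uniform(-1.0, 1.0)
        g_k = 2.0 ** rng.uniform(-1.0, 1.0)
        if abs(toy_ca_amplitude(g_cal, g_k) - target) <= tolerance * target:
            accepted.append((g_cal, g_k))
    return accepted


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


population = sample_population(20000)
g_cal_values = [g for g, _ in population]
g_k_values = [g for _, g in population]
print(len(population), pearson(g_cal_values, g_k_values))
```

A strong positive correlation between the accepted Ca2+ and K+ scale factors is the toy analogue of the compensation described above. Its strength here reflects the surrogate, not measured data.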
Quantifying the rise and fall of scientific fields
Science advances by pushing the boundaries of the adjacent possible. While
the global scientific enterprise grows at an exponential pace, at the
mesoscopic level the exploration and exploitation of research ideas is
reflected through the rise and fall of research fields. The empirical
literature has largely studied such dynamics on a case-by-case basis, with a
focus on explaining how and why communities of knowledge production evolve.
Although fields rise and fall on different temporal and population scales, they
are generally argued to pass through a common set of evolutionary stages. To
understand the social processes that drive these stages beyond case studies, we
need a way to quantify and compare different fields on the same terms. In this
paper we develop techniques for identifying scale-invariant patterns in the
evolution of scientific fields, and demonstrate their usefulness using 1.5
million preprints from the arXiv repository covering 175 research fields
spanning Physics, Mathematics, Computer Science, Quantitative Biology and
Quantitative Finance. We show that fields consistently follow a rise-and-fall
pattern captured by a two-parameter, right-tailed Gumbel temporal distribution.
We introduce a field-specific rescaled time and explore the generic properties
shared by articles and authors at the creation, adoption, peak, and decay
evolutionary phases. We find that the early phase of a field is characterized
by small teams of interdisciplinary authors mixing cognitively distant fields,
while late phases are characterized by large, specialized teams building on
previous work in the field. This method provides a foundation for
quantitatively exploring the generic patterns underlying the evolution of
research fields in science, with general implications for innovation studies.
Comment: 18 pages, 4 figures, 8 SI figures
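A two-parameter, right-tailed Gumbel curve has a location mu, which is also the peak, and a scale beta. Its density is f(t) = (1/beta) exp(-(z + e^(-z))), where z = (t - mu)/beta. A natural field-specific rescaled time is then tau = (t - mu)/beta, which is 0 at the peak.

The sketch below fits mu and beta by the method of moments, using mean = mu + gamma*beta and variance = pi^2 beta^2 / 6, where gamma is the Euler-Mascheroni constant. This is one simple choice, not necessarily the paper's fitting procedure. Treating a field's data as a list of publication times is also an assumption, and the example times are synthetic.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_pdf(t, mu, beta):
    """Right-tailed (maximum) Gumbel density with location mu (the mode) and scale beta."""
    z = (t - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


def fit_gumbel_moments(times):
    """Method-of-moments fit: mean = mu + gamma * beta, variance = pi^2 * beta^2 / 6."""
    mean = statistics.mean(times)
    var = statistics.pvariance(times, mean)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def rescaled_time(t, mu, beta):
    """Field-specific time: 0 at the peak, in units of the field's scale beta."""
    return (t - mu) / beta


# Synthetic publication times: Gumbel quantiles with mu = 2010, beta = 3.
n = 5000
times = [2010.0 - 3.0 * math.log(-math.log((k + 0.5) / n)) for k in range(n)]
mu_hat, beta_hat = fit_gumbel_moments(times)
print(mu_hat, beta_hat, rescaled_time(2016.0, mu_hat, beta_hat))
```

Expressing every field in tau units puts fields of different durations on a common axis. This is what makes the creation, adoption, peak and decay phases comparable across fields.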
Six Homeoproteins and a linc-RNA at the Fast MYH Locus Lock Fast Myofiber Terminal Phenotype
Thousands of long intergenic non-coding RNAs (lincRNAs) are encoded by the mammalian genome. However, the function of most of these lincRNAs has not been identified in vivo. Here, we demonstrate a role for a novel lincRNA, linc-MYH, in adult fast-type myofiber specialization. Fast myosin heavy chain (MYH) genes and linc-MYH share a common enhancer, located in the fast MYH gene locus and regulated by Six1 homeoproteins. linc-MYH in nuclei of fast-type myofibers prevents slow-type and enhances fast-type gene expression. Functional fast-sarcomeric unit formation is achieved by the coordinate expression of fast MYHs and linc-MYH, under the control of a common Six-bound enhancer.
Six1 homeoprotein drives myofiber type IIA specialization in soleus muscle
Background: Adult skeletal muscles are composed of slow and fast myofiber subtypes, each of which expresses selective genes required for its specific contractile and metabolic activity. Six homeoproteins are transcription factors that regulate muscle cell fate by activating myogenic regulatory factors and that drive fast-type gene expression during embryogenesis.
Results: We show here that Six1 protein accumulates more robustly in the nuclei of adult fast-type muscles than in adult slow-type muscles, and that this specific enrichment takes place during perinatal growth. Deletion of Six1 in the soleus impaired fast-type myofiber specialization during perinatal development, resulting in a slow phenotype and a complete lack of myosin heavy chain 2A (MyHCIIA) expression. Global transcriptomic analysis of wild-type and Six1 mutant myofibers identified the gene networks controlled by Six1 in adult soleus muscle. This analysis showed that Six1 is required for the expression of numerous genes encoding fast-type sarcomeric proteins and glycolytic enzymes, as well as genes controlling intracellular calcium homeostasis. In particular, parvalbumin, a key player in calcium buffering, is a direct target of Six1 in the adult myofiber.
Conclusions: This analysis revealed that Six1 controls distinct aspects of adult muscle physiology in vivo and acts as a main determinant of fast-fiber type acquisition and maintenance.
iGEM: a model system for team science and innovation
Teams are a primary source of innovation in science and technology. Rather
than examining the lone genius, scholarly and policy attention has shifted to
understanding how team interactions produce new and useful ideas. Yet the
organizational roots of innovation remain unclear, in part because of the
limitations of current data. This paper introduces the international
Genetically Engineered Machine (iGEM) competition, a model system for studying
team science and innovation. By combining digital laboratory notebooks with
performance data from 2,406 teams over multiple years of participation, we
reveal shared dynamical and organizational patterns across teams and identify
features associated with team performance and success. This dataset makes
visible organizational behavior that is typically hidden, and thus
understudied, creating new opportunities for the science of science and
innovation.
Comment: 78 pages including SI, 7 figures, 18 SI figures
Collaboration and Performance of Citizen Science Projects Addressing the Sustainable Development Goals
Measuring progress towards the Sustainable Development Goals (SDGs) requires the collection of relevant and reliable data. Citizen science can provide an essential source of non-traditional data for tracking progress towards the SDGs, as well as generate social innovations that enable such progress. At its core, citizen science relies on participatory processes involving the collaboration of stakeholders with diverse standpoints, skills, and backgrounds. The ability to measure these participatory processes is therefore key for monitoring and evaluating citizen science projects and for supporting the decisions of their coordinators. Here, we show that monitoring social interaction networks provides unique insights into the participatory processes and outcomes of citizen science projects. We studied fourteen early-stage citizen science projects that participated in an innovation cycle focused on SDG 13, Climate Action, as part of the Crowd4SDG project. We implemented a monitoring strategy to measure the collaborative profiles of citizen science teams. This allowed us to generate dynamic interaction networks across complementary dimensions, making visible both formal and informal interactions associated with the division of labor, collaborations, advice seeking, and communication processes of the projects during their development. Leveraging jury evaluation data, we show that while team composition and communication are associated with project quality, measures of collaboration and activity are associated with engagement quality. Overall, monitoring social interaction dynamics helps build a more comprehensive picture of participatory processes, which is important for guiding citizen science projects and for designing initiatives that leverage citizen science to address the SDGs.
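Monitoring interactions "across complementary dimensions" can be pictured as a multilayer network, with one layer per dimension (for example advice seeking or communication) built over a time window. The sketch below is a minimal example of that idea. It assumes interaction records of the form (time, source, target, dimension) and reports per-layer density and degree.

The record format, the dimension labels, the weekly windows, the metrics and all names are hypothetical. The abstract does not specify the Crowd4SDG monitoring pipeline.

```python
from collections import defaultdict


def build_layers(interactions, start, end):
    """One undirected edge set per interaction dimension, for records with start <= t < end.

    Each record is (t, source, target, dimension); self-interactions are ignored.
    """
    layers = defaultdict(set)
    for t, src, dst, dim in interactions:
        if start <= t < end and src != dst:
            layers[dim].add(frozenset((src, dst)))
    return dict(layers)


def density(edges, n_members):
    """Fraction of possible member pairs that interacted in this layer."""
    possible = n_members * (n_members - 1) / 2
    return len(edges) / possible if possible else 0.0


def degrees(edges):
    """Number of distinct partners per member in this layer."""
    deg = defaultdict(int)
    for edge in edges:
        for member in edge:
            deg[member] += 1
    return dict(deg)


# Hypothetical log for a three-person team; t is the week of the record.
log = [
    (1, "ana", "ben", "communication"),
    (1, "ben", "ana", "communication"),
    (1, "ana", "cai", "advice"),
    (2, "ben", "cai", "communication"),
    (2, "cai", "ana", "division_of_labor"),
]
for week in (1, 2):
    layers = build_layers(log, week, week + 1)
    print(week, {dim: round(density(e, 3), 2) for dim, e in layers.items()})
```

Computing the same layer metrics for successive windows gives a simple dynamic profile per team. Such a profile could then be compared with external evaluations such as jury scores.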
Inducing social self-sorting in organic cages to tune the shape of the internal cavity
Many interesting target guest molecules have low symmetry, yet most methods for synthesising hosts result in highly symmetrical capsules. Methods of generating lower-symmetry pores are thus required to maximise the binding affinity in host–guest complexes. Herein, we use mixtures of tetraaldehyde building blocks with cyclohexanediamine to access low-symmetry imine cages. Whether a low-energy cage is isolated can be correctly predicted from the thermodynamic preference observed in computational models. The stability of the observed structures depends on the geometrical match of the aldehyde building blocks. One bent aldehyde stands out as unable to assemble into high-symmetry cages, and the same aldehyde generates low-symmetry, socially self-sorted cages when combined with a linear aldehyde. We exploit this finding to synthesise a family of low-symmetry cages containing heteroatoms, illustrating that pores of varying geometries and surface chemistries may be reliably accessed through computational prediction and self-sorting.
Computational analysis of cis-regulatory elements in Drosophila and mammalian genomes (Analyse computationnelle des éléments cis-régulateurs dans les génomes des drosophiles et des mammifères)
Cellular differentiation and tissue specification depend in part on the establishment of specific transcriptional programs of gene expression. These programs result from the interpretation of genomic regulatory information by sequence-specific transcription factors (TFs). Decoding this information in sequenced genomes is therefore a key issue. In the first part, we study the interaction between TFs and the DNA sequences they bind to, called transcription factor binding sites (TFBSs). Using a Potts model inspired by spin-glass physics, together with high-throughput binding data for a variety of Drosophila and mammalian TFs, we show that TFBSs exhibit correlations among nucleotides and that accounting for their contribution to the binding energy greatly improves the predictability of genomic TFBSs. We then present Imogene, an extension to mammalian genomes of a Bayesian, phylogeny-based algorithm that was previously applied to Drosophila regulation and is designed to computationally identify the cis-regulatory modules (CRMs) controlling gene expression in a set of co-regulated genes. Starting from a small number of CRMs in a reference species as a training set, but with no a priori knowledge of the factors acting in trans, the algorithm uses the over-representation and conservation of TFBSs among related species to predict putative regulatory elements along with the genomic CRMs underlying co-regulation. We present several applications of this algorithm in both Drosophila and vertebrates. We also present an extension of the algorithm to pattern recognition, showing that CRMs with different expression patterns can be distinguished solely on the basis of their DNA motif content. Finally, we present applications of these modeling tools to real biological cases: trichome differentiation in Drosophila and skeletal muscle differentiation in the mouse. In both cases, the predictions were experimentally validated in joint work with biology teams and point to a high degree of flexibility in cis-regulatory processes.
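The two signals named above, over-representation and conservation of TFBSs, can each be illustrated with a generic statistic. The sketch below is not Imogene's Bayesian, phylogeny-based model. It swaps in two simpler pieces:

- A naive conservation filter keeps a reference-species motif hit only if enough other species also have a hit at the same alignment column.
- A binomial upper-tail test asks whether the number of retained hits is surprising given a background per-position hit rate.

The alignment-column input, the background rate, the toy hits and all names are hypothetical.

```python
import math


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p) with 0 < p < 1, in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def overrepresentation_pvalue(hits, n_positions, background_rate):
    """P(X >= hits) if each of n_positions is a motif hit with probability background_rate."""
    if hits <= 0:
        return 1.0
    if hits > n_positions:
        return 0.0
    tail = sum(math.exp(log_binom_pmf(i, n_positions, background_rate))
               for i in range(hits, n_positions + 1))
    return min(1.0, tail)


def conserved_hits(hits_by_species, reference, min_species):
    """Reference hit columns that are also hits in at least `min_species` other species.

    Assumes hit positions are already expressed as columns of a multiple alignment.
    """
    others = [cols for sp, cols in hits_by_species.items() if sp != reference]
    return {col for col in hits_by_species[reference]
            if sum(col in cols for cols in others) >= min_species}


# Hypothetical aligned hit columns for one motif in a training CRM.
hits = {"mouse": {12, 40, 77, 90}, "human": {12, 40, 90}, "rat": {40, 77, 90}}
kept = conserved_hits(hits, "mouse", min_species=2)
print(sorted(kept), overrepresentation_pvalue(len(kept), 500, 0.001))
```

Working in log space with `math.lgamma` keeps the binomial terms finite for CRM-sized sequence lengths, where exact binomial coefficients would overflow a float.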